{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Datashader dashboard\n",
    "\n",
    "This notebook contains the code for an interactive dashboard for making [Datashader](http://datashader.org) plots from any dataset that has latitude and longitude (geographic) values. Apart from Datashader itself, the code relies on other Python packages from the [PyViz](http://pyviz.org) project that are each designed to make it simple to:\n",
    "\n",
    "- lay out plots and widgets into an app or dashboard, in a notebook or for serving separately ([Panel](http://panel.pyviz.org))\n",
    "- build interactive web-based plots without writing JavaScript ([Bokeh](http://bokeh.pydata.org))\n",
    "- build interactive Bokeh-based plots backed by Datashader, from concise declarations ([HoloViews](http://holoviews.org))\n",
    "- express dependencies between parameters and code to build reactive interfaces declaratively ([Param](http://param.pyviz.org))\n",
    "- describe the fields and plotting information needed to plot a dataset, in a text file ([Intake](http://intake.readthedocs.io))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, colorcet, param as pm, holoviews as hv, panel as pn, datashader as ds\n",
    "import intake, geoviews.tile_sources as gts\n",
    "from holoviews.operation.datashader import rasterize, shade, spread\n",
    "from collections import OrderedDict as odict\n",
    "\n",
    "hv.extension('bokeh', logo=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can run the dashboard here in the notebook with various datasets by editing the `dataset` below to specify some dataset defined in `dashboard.yml`.  You can also launch a separate, standalone server process in a new browser tab with a command like:\n",
    "\n",
    "```\n",
    "DS_DATASET=nyc_taxi panel serve --show dashboard.ipynb\n",
    "```\n",
    "\n",
    "(Where `nyc_taxi` can be replaced with any of the available datasets (`nyc_taxi`, `nyc_taxi_50k` (tiny version), `census`, `opensky`, `osm-1b`) or any dataset whose description you add to `dashboard.yml`). To launch multiple dashboards at once, you'll need to add `-p 5001` (etc.) to select a unique port number for the web page to use for communicating with the Bokeh server.  Otherwise, be sure to kill the server process before launching another instance.\n",
    "\n",
    "For most of these datasets, if you have less than 16GB of RAM on your machine, you should remove the `.persist()` method call below, to tell [Dask](http://dask.pydata.org) to work out of core instead of loading all data into memory.  However, doing so will make interactive use substantially slower than if sufficient memory were available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = os.getenv(\"DS_DATASET\", \"nyc_taxi\")\n",
    "catalog = intake.open_catalog('dashboard.yml')\n",
    "source  = getattr(catalog, dataset)\n",
    "source.to_dask().persist();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Intake `source` object lets us treat data in many different formats the same in the rest of the code here. We can now build a class that captures some parameters that the user can vary along with how those parameters relate to the code needed to update the displayed plot of that data source:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plots  = {source.metadata['plots'][p].get('label',p):p for p in source.plots}\n",
    "fields = odict([(v.get('label',k),k) for k,v in source.metadata['fields'].items()])\n",
    "field  = next(iter(fields.items()))[1]\n",
    "aggfns = odict([(f.capitalize(),getattr(ds,f)) for f in ['count','sum','min','max','mean','var','std']])\n",
    "\n",
    "norms  = {'Histogram_Equalization': 'eq_hist', 'Linear': 'linear', 'Log': 'log', 'Cube root': 'cbrt'}\n",
    "cmaps  = {n: colorcet.palette[n] for n in ['fire', 'bgy', 'bgyw', 'bmy', 'gray', 'kbc']}\n",
    "\n",
    "maps   = ['CartoMidnight', 'StamenWatercolor', 'StamenTonerBackground', 'EsriImagery', 'EsriUSATopo', 'EsriTerrain']\n",
    "bases  = {name: ts.relabel(name) for name, ts in gts.tile_sources.items() if name in maps}\n",
    "\n",
    "gopts  = hv.opts.WMTS(width=800, height=650, xaxis=None, yaxis=None, bgcolor='black', show_grid=False)\n",
    "\n",
    "class Explorer(pm.Parameterized):\n",
    "    plot          = pm.ObjectSelector( precedence=0.10, default=source.plots[0], objects=plots)\n",
    "    field         = pm.ObjectSelector( precedence=0.11, default=field,           objects=fields)\n",
    "    agg_fn        = pm.ObjectSelector( precedence=0.12, default=ds.count,        objects=aggfns)\n",
    "    \n",
    "    normalization = pm.ObjectSelector( precedence=0.13, default='eq_hist',       objects=norms)\n",
    "    cmap          = pm.ObjectSelector( precedence=0.14, default=cmaps['fire'],   objects=cmaps)\n",
    "    spreading     = pm.Integer(        precedence=0.16, default=0,               bounds=(0, 5))\n",
    "    \n",
    "    basemap       = pm.ObjectSelector( precedence=0.18, default=bases['EsriImagery'], objects=bases)\n",
    "    data_opacity  = pm.Magnitude(      precedence=0.20, default=1.00, doc=\"Alpha value for the data\")\n",
    "    map_opacity   = pm.Magnitude(      precedence=0.22, default=0.75, doc=\"Alpha value for the map\")\n",
    "    show_labels   = pm.Boolean(        precedence=0.24, default=True)\n",
    "    \n",
    "    @pm.depends('plot')\n",
    "    def elem(self):\n",
    "        return getattr(source.plot, self.plot)()\n",
    "\n",
    "    @pm.depends('field', 'agg_fn')\n",
    "    def rasterize(self, element, x_range=None, y_range=None):\n",
    "        field = None if self.field == \"counts\" else self.field\n",
    "        return rasterize(element, width=800, height=600, aggregator=self.agg_fn(field),\n",
    "                         x_range=x_range, y_range=y_range, dynamic=False)\n",
    "\n",
    "    @pm.depends('map_opacity','basemap')\n",
    "    def tiles(self):\n",
    "        return self.basemap.opts(gopts).opts(alpha=self.map_opacity)\n",
    "\n",
    "    @pm.depends('show_labels')\n",
    "    def labels(self):\n",
    "        return gts.StamenLabels.options(level='annotation', alpha=1 if self.show_labels else 0)\n",
    "    \n",
    "    @pm.depends('data_opacity')\n",
    "    def apply_opacity(self, shaded):\n",
    "        return shaded.opts(alpha=self.data_opacity, show_legend=False)\n",
    "\n",
    "    def viewable(self,**kwargs):\n",
    "        data_dmap  = hv.DynamicMap(self.elem)\n",
    "        rasterized = hv.util.Dynamic(data_dmap, operation=self.rasterize, streams=[hv.streams.RangeXY])\n",
    "        \n",
    "        c_stream   = hv.streams.Params(self, ['cmap', 'normalization'])\n",
    "        s_stream   = hv.streams.Params(self, ['spreading'], rename={'spreading': 'px'})\n",
    "        shaded     = spread(shade(rasterized, streams=[c_stream]), streams=[s_stream], how=\"add\")\n",
    "        shaded     = hv.util.Dynamic(shaded, operation=self.apply_opacity)\n",
    "        \n",
    "        return hv.DynamicMap(self.tiles) * shaded * hv.DynamicMap(self.labels)\n",
    "\n",
    "explorer = Explorer(name=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we call the `.viewable` method on the `explorer` object we just created, we'll get a plot that displays itself in a notebook cell.  Moreover, because of how we declared the dependencies between each bit of code and each parameters, the corresponding part of that plot will update whenever one of the parameters is changed on it. (Try putting `explorer.viewable()` in one cell, then set some parameter like `explorer.spreading=4` in another cell.) But since what we want is the user to be able to manipulate the values using widgets, let's go ahead and create a dashboard out of this object by laying out a logo, widgets for the parameters, and the viewable object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logo = \"https://raw.githubusercontent.com/pyviz/datashader/master/doc/_static/logo_horizontal_s.png\"\n",
    "\n",
    "panel = pn.Row(pn.Column(logo, pn.Param(explorer.param, expand_button=False)), explorer.viewable())\n",
    "panel.servable()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are viewing this notebook with a live Python server process running, adjusting one of the widgets above should now automatically update the plot, re-running only the code needed to update that particular item without re-running datashader if that's not needed. It should work the same when launched as a separate server process, but without the extra text and code visible as in this notebook. Here the `.servable()` method call indicates what should be served when run as a separate dashboard with a command like `panel serve --show dashboard.ipynb`, or you can just copy the code out of this notebook into a .py file that will work the same as this .ipynb file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How it works\n",
    "\n",
    "You can use the code above as is, but if you want to adapt it to your own purposes, you can read on to see how it works. The code has three main components:\n",
    "\n",
    "1. `source`: A dataset with associated metadata managed by [Intake](http://intake.readthedocs.io), which allows this notebook to ignore details like:\n",
    "   - File formats\n",
    "   - File locations\n",
    "   - Column and field names in the data<br><br>\n",
    "   Basically, once the `source` has been defined in the cell starting with `dataset`, this code can treat all datasets the same, as long as their properties have been declared appropriately in the `dashboard.yml` file.<br><br>\n",
    "\n",
    "2. `explorer`: A [Parameterized](http://param.pyviz.org) object that declares:\n",
    "   - What parameters we want the user to be able to manipulate\n",
    "   - How to generate the plot specified by those parameters, using [HoloViews](http://holoviews.org), [GeoViews](http://holoviews.org),  [Datashader](http://datashader.org), and [Bokeh](http://bokeh.pydata.org).\n",
    "   - Which bits of the code need to be run when one of the parameters changes<br><br>\n",
    "   All of these things are declared in a general way that's not tied to any particular GUI toolkit, as long as whatever is returned by `viewable()` is something that can be displayed.<br><br>\n",
    "   \n",
    "3. `panel`: A [Panel](http://panel.pyviz.org)-based app/dashboard consisting of:\n",
    "   - a logo (just for pretty!)\n",
    "   - The user-adjustable parameters of the `explorer` object.\n",
    "   - The viewable HoloViews object defined by `explorer`.\n",
    "\n",
    "You can find out more about how to work with these objects at the websites linked for each one. If you want to start working with this code for your own purposes, parts 1 and 3 should be simple to get started with. You should be able to add new datasets easily to `dashboard.yml` by copying the description of the simplest dataset (e.g. `osm-1b`). If you wish, you can then compare that dataset's description to the other datasets, to see how other fields and metadata can be added if you want there to be more options for users to explore a particular dataset. \n",
    "\n",
    "Similarly, you can easily add additional items to lay out in rows and columns in the `panel` app; it should be trivial to add anything Panel supports (text boxes, images, other separate plots, etc.) to this layout as described at [Panel.org](http://panel.pyviz.org). \n",
    "\n",
    "Part 2 (the `explorer` object) is the hard part to specify, because that's where the complex relationships between the user-visible parameters and the underlying behavior is expressed. Briefly, the `Explorer` class is used to create an instance `explorer` that serves dual purposes. First, it provides an `explorer.param` object that declares what widgets should be made available for the user to manipulate.  Second, it provides an `explorer.viewable` method that returns a displayable object automatically tied to each of those parameters, so that the appropriate part of the plot updates when a parameter is changed. In simple cases you can simply have all computation depend on any parameter, avoiding any complexity by re-running everything. However, Datashader is expected to be used with enormous datasets, so we have chosen to be very careful about not re-running any data-processing code unless it is absolutely necessary.\n",
    "\n",
    "Let's look more closely at `explorer.viewable()` to see how this is done.  What's returned by that method is a [HoloViews](http://holoviews.org) object, in this case an `hv.Overlay` of three components: the underlying tile-based map (like Google Maps), the [datashaded](http://datashader.org) data, and overlaid geographic labels (which also happens to be a tile-based map, but with only text).  If you type `explorer.viewable()` in a cell on its own, you can see that the resulting object is viewable outside of Panel and is still controlled by all the same parameters; Panel just adds visible widgets that let the user change the parameters without writing Python code. \n",
    "\n",
    "To understand this method, first consider a simpler version that doesn't display the data:\n",
    "\n",
    "```\n",
    "def viewable(self,**kwargs):\n",
    "    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.labels)\n",
    "```\n",
    "\n",
    "Here, `hv.DynamicMap(callback)` returns a dynamic HoloViews object that calls the provided `callback` whenever the object needs updating.  When given a Parameterized method, `hv.DynamicMap` understands the dependency declarations if present.  In this case, the map tiles will thus be updated whenever the `map_opacity` or `basemap` parameters change, and the overlaid labels will be updated whenever the `show_labels` parameter changes (because those are the relationships expressed with `param.depends` above).  The `viewable()` method here returns an overlay (constructed by the `*` syntax for HoloViews objects), retaining the underlying dynamic behavior of the two overlaid items.\n",
    "\n",
    "Still following along? If not, try changing `viewable` to the simpler version shown above and play around with the source code to see how those parts fit together. Once that all makes sense, then we can add in a plot of the actual data:\n",
    "\n",
    "```\n",
    "def viewable(self,**kwargs):\n",
    "    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.elem) * hv.DynamicMap(self.labels)\n",
    "```\n",
    "\n",
    "Just as before, we use a `DynamicMap` to call the `.elem()` method whenever one of its parameter dependencies changes (`plot` in this case).  Don't actually run this version, though, unless you have a very small dataset (even the tiny `nyc_taxi_50k` may be too large for some browsers).  As written, this code will pass all the data on to your browser, with disastrous results for large datasets!  This is where Datashader comes in; to make it safe for large data, we can instead wrap this object in some HoloViews operations that turn it into something safe to display:\n",
    "\n",
    "```\n",
    "def viewable(self,**kwargs):\n",
    "    return hv.DynamicMap(self.tiles) * spread(shade(rasterize(hv.DynamicMap(self.elem)))) * hv.DynamicMap(self.labels)\n",
    "```\n",
    "\n",
    "This version is now runnable, with `rasterize()` dynamically aggregating the data using Datashader whenever a new plot is needed, `shade()` then dynamically colormapping the data into an RGB image, and `spread()` dynamically spreading isolated pixels so that they become visible data points.  But if you try it, you'll notice that the plot is ignoring all of the rasterization, shading, and spreading parameters we declared above, because those parameters are not declared as dependencies of the `elem` method.  \n",
    "\n",
    "We could add those parameters as dependencies to `.elem`, but if we do that, then the whole set of chained operations will need to be re-run every time any one of those parameters changes. For a large dataset, re-running all those steps can take seconds or even minutes, yet some of the changes only affect the very last (and very cheap) stages of the computation, such as `spread` or `shade`. \n",
    "\n",
    "So, we come to the final version of `viewable()` that's used in the actual class definition above:\n",
    "- first create a `data_dmap` DynamicMap object that updates the HoloViews element when the `plot` parameter changes\n",
    "- then create a DynamicMap `rasterized` that applies the rasterize operation to the `data_dmap` while bringing in the `field` and `agg_fn` parameters\n",
    "- then create a HoloViews \"stream\" `c_stream` that watches the parameters used in colormapping (`cmap` and `normalization`)\n",
    "- then create a HoloViews \"stream\" `s_stream` that watches the parameters used in spreading (`spreading`)\n",
    "- then create a DynamicMap that applies shading and spreading driven by the streams just created\n",
    "- then return the overlay as in each of the simpler versions of `viewable` above\n",
    "\n",
    "One wrinkle here is that by default the `rasterize` operation automatically attaches the `RangeXY` stream parameters of the plot to the DynamicMap it creates, to make it responsive to changes in the visible viewport caused by zooming and panning. However, here we need to return the result of `rasterize` as the callback in a DynamicMap, which must be  a static element, so we pass `rasterize(... dynamic=False)` and then explicitly attach the `RangeXY` stream to the `rasterized` DynamicMap later.\n",
    "\n",
    "As if all that weren't confusing enough, here we had to use three different ways of making a DynamicMap: \n",
    "1. Creating one directly: `hv.DynamicMap(self.callbackmethod)`: makes the result of a callback displayable on updates\n",
    "2. Wrapping an existing DynamicMap with an operation (`rasterize`, `shade`, `spread`, etc.): chains an operation on top of the output of something already dynamic, optionally attaching \"streams\" to control the stage-specific parameters dynamically\n",
    "3. Using `hv.util.Dynamic`: applies a method to the given object, controlled by supplied streams\n",
    "\n",
    "As you can see, we had to use some esoteric features of HoloViews, but we were able to precisely characterize which bits of the computation need to be re-run, providing maximal responsiveness where possible (try dragging the opacity sliders or selecting colormaps), while re-running everything when needed (when aggregation-related parameters change). In many cases you can use much simpler approaches than were needed here, as we were able to do for the map tiles and labels above."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
